Problem Set 3 - Solutions

library(tidyverse)
library(sf)
library(tmap)
library(ggalluvial)
library(ggHoriPlot)
my_theme <- theme_bw() +
  theme(
    panel.background = element_rect(fill = "#f7f7f7"),
    panel.grid.minor = element_blank(),
    axis.ticks = element_blank(),
    plot.background = element_rect(fill = "transparent", colour = NA)
  )
theme_set(my_theme)

Glacial Lakes

Scoring

  • a, Accuracy and Code (1 point): Correct numbers and concise implementation (1 point), correct numbers but unnecessarily complex implementation (0.5 points), incorrect numbers (0 points).
  • b and c, Design (0.5 points each): Creative and readable design (0.5 points), generally appropriate but lacking critical attention (0.5 points), difficult to read (0 points).
  • b and c, Code (0.5 points each): Clear and concise (0.5 points), correct but unnecessarily complex (0.25 points), missing (0 points)

Question

The data at this link contain labels of glacial lakes in the Hindu Kush Himalaya, created during an ecological survey in 2015 by the International Centre for Integrated Mountain Development.

Example Solution

  1. How many lakes are in this dataset? What are the latitude / longitude coordinates of the largest lakes in each Sub-basin?
lakes <- read_sf("https://github.com/krisrs1128/stat679_code/raw/main/activities/week8/GL_3basins_2015.geojson")
top_lakes <- lakes %>%
  group_by(Sub_Basin) %>%
  slice_max(Area) %>%
  mutate(Sub_Basin = fct_reorder(Sub_Basin, -Area))

There are 3624 lakes in this dataset. The coordinates of the largest lake in each sub-basin are printed below. The fct_reorder step in the code above ensures that, in our later faceted views, the lakes appear in order from largest to smallest.

top_lakes
## Simple feature collection with 20 features and 10 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 80.17421 ymin: 27.71149 xmax: 87.87144 ymax: 30.56714
## Geodetic CRS:  WGS 84
## # A tibble: 20 × 11
## # Groups:   Sub_Basin [20]
##       Id Latitude Longitude GL_ID     Basin Sub_B…¹   Area Eleva…² Type  Country
##  * <int>    <dbl>     <dbl> <chr>     <chr> <fct>    <dbl>   <int> <chr> <chr>  
##  1    29     28.4      86.3 GL086304… Koshi Arun    4.02      5344 M(e)  China  
##  2   186     29.2      82.9 GL082948… Karn… Bheri   4.92      3612 O     Nepal  
##  3   237     28.6      84.6 GL084628… Gand… Budhi … 0.267     3628 M(e)  Nepal  
##  4    92     27.9      86.9 GL086925… Koshi Dudh K… 1.34      5002 M(e)  Nepal  
##  5   778     30.1      81.8 GL081780… Karn… Humla   0.757     5014 M(e)  Nepal  
##  6   596     28.0      85.7 GL085717… Koshi Indraw… 0.0269    4061 E(o)  Nepal  
##  7  1120     30.6      80.2 GL080178… Karn… Kali    0.240     4864 M(o)  India  
##  8   426     29.2      83.7 GL083701… Gand… Kali G… 0.434     5426 M(l)  Nepal  
##  9   333     29.6      81.6 GL081554… Karn… Karnali 0.126     4550 E(c)  Nepal  
## 10   585     29.9      81.6 GL081577… Karn… Kawari  0.284     3566 M(e)  Nepal  
## 11   966     27.7      86.5 GL086542… Koshi Likhu   0.0903    4938 M(e)  Nepal  
## 12   291     28.7      83.9 GL083851… Gand… Marsya… 3.38      4910 M(e)  Nepal  
## 13   456     29.8      82.4 GL082414… Karn… Mugu    0.428     4692 M(e)  Nepal  
## 14   178     28.4      84.1 GL084116… Gand… Seti    0.102     2444 O     Nepal  
## 15    36     28.3      85.8 GL085838… Koshi Sun Ko… 5.41      5067 M(e)  China  
## 16   240     27.9      86.4 GL086447… Koshi Tama K… 1.72      5042 M(e)  China  
## 17   220     27.9      87.9 GL087866… Koshi Tamor   0.696     4910 M(e)  Nepal  
## 18   242     29.4      82.4 GL082423… Karn… Tila    0.432     4424 E(c)  Nepal  
## 19   186     28.5      85.5 GL085519… Gand… Trishu… 0.454     4445 M(e)  China  
## 20   470     29.8      81.5 GL081526… Karn… West S… 0.508     4573 M(e)  Nepal  
## # … with 1 more variable: geometry <POLYGON [°]>, and abbreviated variable
## #   names ¹​Sub_Basin, ²​Elevation
  1. Plot the polygons associated with each of the lakes identified in step (a). Hint: You may find it useful to split lakes across panels using the tm_facets function. If you use this approach, make sure to include a scale bar with tm_scale_bar(), so that it is clear that lakes have different sizes.

We can generate this display by combining the tm_polygons and tm_facets functions. Setting nrow = 5 makes the most efficient use of the available space: the 20 panels fill a complete grid, so there is no blank space caused by the facets wrapping around.

tm_shape(top_lakes) +
  tm_polygons(col = "#494FBF") +
  tm_facets(by = "Sub_Basin", nrow = 5) +
  tm_scale_bar()

  1. Visualize all lakes with latitude between 28.2 and 28.4 and with longitude between 85.8 and 86. Optionally, add a basemap associated with each lake.

We can generate this figure by filtering the lakes dataset and calling tm_polygons() again. The tmap_options and tmap_mode calls allow us to overlay the polygons on a satellite imagery basemap of the region.

tmap_options(basemaps = c(Canvas = "Esri.WorldImagery"))
tmap_mode("view")

lakes %>%
  filter(Latitude > 28.2, Latitude < 28.4, Longitude > 85.8, Longitude < 86) %>%
  tm_shape() +
  tm_polygons(col = "#494FBF")

Australian Pharmaceuticals

Grading

  • a and b, Design (0.5 points each): Creative and readable design (0.5 points), generally appropriate but lacking critical attention (0.5 points), difficult to read (0 points).
  • a and b, Code (0.5 points each): Clear and concise (0.5 points), correct but unnecessarily complex (0.25 points), missing (0 points)
  • c, Discussion (1 point): Correct and well-developed interpretation (1 point), correct but somewhat underdeveloped (0.5 points), missing or incorrect interpretations (0 points).

Question

The code below takes the full PBS dataset from the previous problem and filters down to the 10 most commonly prescribed pharmaceutical types. This problem will ask you to implement and compare two approaches to visualizing this dataset.

pbs_full <- read_csv("https://uwmadison.box.com/shared/static/fcy9q1uleqru7gcs287q903y0rcnw2a2.csv") %>%
    mutate(Month = as.Date(Month))

top_atcs <- pbs_full %>%
  group_by(ATC2_desc) %>%
  summarise(total = sum(Scripts)) %>%
  slice_max(total, n = 10) %>%
  pull(ATC2_desc)

pbs <- pbs_full %>%
  filter(ATC2_desc %in% top_atcs, Month > "2007-01-01")

Example Solution

  1. Implement a stacked area visualization of these data.

We can use geom_area to generate this type of view. We have customized the view to make it more useful:

  • The gaps along the x and y directions have been removed by setting the expand arguments of the scales.
  • The \(y\)-axis labels are given in millions, rather than the original scientific notation.
  • The color palette has been customized, and its label is more informative than the default ATC2_desc.
  • We have moved the legend to the bottom, since it otherwise forces the series to be compressed.
  • We’ve sorted the stacks so that the most commonly prescribed medications appear at the bottom.
pbs <-  pbs %>%
  mutate(ATC2_desc = fct_reorder(ATC2_desc, Scripts))

ggplot(pbs) +
  geom_area(aes(Month, Scripts, fill = ATC2_desc)) +
  scale_fill_brewer(palette = "Set3") +
  scale_y_continuous(expand = c(0, 0, .1, 0), labels = scales::label_number_si()) +
  scale_x_date(expand = c(0, 0, 0, 0)) +
  guides(fill=guide_legend(nrow = 5, byrow = TRUE)) +
  labs(fill = "Prescription") +
  theme(legend.position = "bottom")

  1. Implement an alluvial visualization of these data.

We can use geom_alluvium from the ggalluvial package to generate this type of figure. We have used the same style customizations as in the geom_area plot above. Note that we need to set both the fill and color if we want to modify the palette, since the streams have both a fill and a border color.

ggplot(pbs) +
  geom_alluvium(
    aes(Month, Scripts, fill = ATC2_desc, col = ATC2_desc, alluvium = ATC2_desc), 
    alpha = 0.9, decreasing = FALSE
  ) +
  scale_fill_brewer(palette = "Set3") +
  scale_color_brewer(palette = "Set3", guide = "none") +
  scale_y_continuous(expand = c(0, 0, .1, 0), labels = scales::label_number_si()) +
  scale_x_date(expand = c(0, 0, 0, 0)) +
  guides(
    fill = guide_legend(nrow = 5, byrow = TRUE),
    color = guide_legend(nrow = 5, byrow = TRUE)
    ) +
  labs(fill = "Prescription", col = "Prescription") +
  theme(legend.position = "bottom")

  1. Compare and contrast the strengths and weaknesses of these two visualization strategies. Which user queries are easier to answer using one approach vs. the other?
  • Ranking comparisons. Questions about rankings are much easier to answer using the alluvium plot. Though the stacked area view lets us roughly tell that Agents Acting on the Renin-Angiotensin System was the most common prescription in the middle range of the time series, it’s not at all clear when its growth / decline took place. Rankings for the rarer prescriptions are essentially impossible to read without the alluvium. The alluvium view also highlights large shifts in rankings (e.g., Drugs for Acid Related Disorders) that were not at all obvious in the stacked area view.

  • Trends in totals. It is easier to evaluate trends in total prescription amount using the stacked area view, compared to the alluvium. Though we can tell from both views that total prescriptions increased around summer 2007, the smoothing and gaps in the alluvium make the rapidity of the shift less obvious.

  • Per-prescription trends. Comparing the width of one stream at two points in the alluvium plot is complicated. It requires remembering the width at one point and then searching for where the stream has gone at another timepoint. In contrast, for the area plot, changes in width are easy to judge, because the same drug appears at approximately the same \(y\) value throughout.
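To make the ranking comparison concrete, here is a minimal sketch in plain JavaScript (no D3 required) of the per-timepoint ranking that an alluvium-style view effectively displays by re-sorting streams at each time, whereas a stacked area view keeps one fixed order. The data values and the helper name rank_within are made up for illustration and are not part of the PBS dataset or of ggalluvial.

```javascript
// Toy data (made-up values): scripts per category at each month.
const months = {
  "2007-02": { A: 50, B: 30, C: 20 },
  "2007-03": { A: 25, B: 40, C: 35 },
};

// Hypothetical helper: rank categories within one month (1 = most prescribed).
function rank_within(totals) {
  const ordered = Object.keys(totals).sort((a, b) => totals[b] - totals[a]);
  return Object.fromEntries(ordered.map((name, i) => [name, i + 1]));
}

for (const [month, totals] of Object.entries(months)) {
  console.log(month, rank_within(totals));
}
```

In this toy example, category A drops from rank 1 to rank 3 between the two months. That change is immediate when streams are re-sorted at each timepoint, but it has to be inferred from band widths when the stacking order is fixed.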

Spotify Time Series

Grading

  • a and b, Design (0.5 points each): Creative and readable design (0.5 points), generally appropriate but lacking critical attention (0.5 points), difficult to read (0 points).
  • a and b, Code (0.5 points each): Clear and concise (0.5 points), correct but unnecessarily complex (0.25 points), missing (0 points)
  • c, Discussion (1 point): Correct and well-developed interpretation (1 point), correct but somewhat underdeveloped (0.5 points), missing or incorrect interpretations (0 points).

Question

The code below provides the number of Spotify streams for the 40 tracks with the highest stream count in the Spotify 100 dataset for 2017. This problem will ask you to explore a few different strategies that can be used to visualize this time series collection.

spotify_full <- read_csv("https://uwmadison.box.com/shared/static/xj4vupjbicw6c8tbhuynw0pll6yh1w0d.csv")
top_songs <- spotify_full %>%
  group_by(track_name) %>%
  summarise(total = sum(streams)) %>%
  slice_max(total, n = 40) %>%
  pull(track_name)

spotify <- spotify_full %>%
  filter(region == "global", track_name %in% top_songs)

Example Solution

  1. Design and implement a line-based visualization of these data.

We’ve plotted each series using geom_line in the display below. Note that there were two tracks with the same track name (but different artists), so we have to group the lines according to all artist * track_name combinations. We have customized the line, axes, and label appearances. Though there are too many lines to label them all, we can see some of the common structure across tracks. Most have a quick rise and gradual decline. All have strong cyclic (weekly) structure. Some have spikes in popularity in the middle of their run among the most streamed songs.

ggplot(spotify) +
  geom_line(
    aes(date, streams, group = interaction(artist, track_name)),
    linewidth = 0.4, col = "#808080", alpha = 0.8
  ) +
  scale_x_date(expand = c(0, 0, 0, 0)) +
  scale_y_continuous(expand = c(0, 0, .1, 0), labels = scales::label_number_si()) +
  labs(
    x = "Date", 
    y = "Number of Streams"
  )

  1. Design and implement a horizon plot visualization of these data.

We can use geom_horizon from the ggHoriPlot package to make this horizon plot. We have sorted the tracks according to the median number of streams each had over the course of its run. We have also shortened the track names so that they don’t take up too much space. The main advantage of this view is that it allows us to associate song names with the observed data. This is helpful for answering questions like, which was the most streamed song (Shape of You), what was the song with the sudden spike in popularity (Despacito), and which songs had a second wave of popularity after an initial decline (1-800-273-8255, Me Rehuso, Believer, …). The cost is that it is much harder to recognize the large range in popularity across tracks – Shape of You and Paris have similar looking trends, though they have very different total numbers of streams.

spotify %>%
  mutate(track_name = str_sub(track_name, 1, 25)) %>%
  ggplot() +
  geom_horizon(aes(date, streams)) +
  facet_grid(reorder(track_name, -streams, median) ~ .) +
  scale_x_date(expand = c(0, 0, 0, 0)) +
  scale_fill_hcl(palette = "Tropic", guide = "none", reverse = TRUE) +
  theme(
    axis.text.y = element_blank(),
    strip.text.y = element_text(angle = 0)
  )
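To make the horizon encoding concrete, the sketch below shows one way the folding of a value into stacked color bands could be computed, written in plain JavaScript so that it runs without any libraries. This is only an illustration of the general idea, not how ggHoriPlot is implemented. The helper name horizonBands and its parameters (origin, bandHeight, nBands) are hypothetical.

```javascript
// Hypothetical helper: split a value into horizon bands relative to an origin.
// Returns, for each band, the fraction (0 to 1) of that band that is filled,
// plus a sign saying whether the value lies above (+1) or below (-1) the origin.
function horizonBands(value, origin, bandHeight, nBands) {
  const deviation = value - origin;
  const magnitude = Math.abs(deviation);
  const fills = [];
  for (let b = 0; b < nBands; b++) {
    // Portion of the deviation that falls inside band b, clipped to [0, bandHeight]
    const inBand = Math.min(Math.max(magnitude - b * bandHeight, 0), bandHeight);
    fills.push(inBand / bandHeight);
  }
  return { sign: Math.sign(deviation), fills };
}

const example = horizonBands(250, 100, 100, 3);
console.log(example); // sign: 1, fills: [1, 0.5, 0]
```

Because every track is folded into the same small number of bands, each facet can stay short. Under this sketch, changing origin or bandHeight changes which color a given stream count receives.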

  1. Building from the static views from (a - b), propose, but do not implement, a visualization of this dataset that makes use of dynamic queries. What would be the structure of interaction, and how would the display update when the user provides a cue? Explain how you would use one type of D3 selection or mouse event to implement this hypothetical interactivity.

We could imagine linking the line and horizon plots to try to get the best of both worlds. Here are some possibilities:

  • We could place the horizon and line plot side-by-side, with all the horizons visible clearly but the lines shown only faintly. Clicking on a facet from the horizon plot would highlight the associated line in the line plot (we would also want to draw a border on the facet showing that it is selected). Selecting several of the tracks in the horizon would allow comparison between tracks on an absolute scale. This idea could be implemented by adding a click event listener on each rectangle defining the background for the horizon plot. By binding the rectangles to the track names, we could look up the associated track whenever the event triggers.

  • A less obvious form of interaction would be to use the line plot to update the parameters of the horizon plot. This is one of the ideas discussed in the paper “Interactive Horizon Graphs: Improving the compact visualization of multiple time series” (doi: 10.1145/2470654.2466441). We could click and drag a horizontal line on the line graph to interactively change the baseline in the horizon view (the number at which blue transitions into purple). This could be implemented with a d3.drag() behavior, applied via .call() to the path defining the horizontal line.

Alternatively (or in addition), we could introduce a slider to adapt the number of breaks in the horizon color palette. We could even allow the user to adjust the absolute \(y\) values at which the breaks are given by referring to the \(y\)-axis in the line plot. Small rectangles could be placed along the \(y\)-axis to show the color associated with that range of values. One way to implement this would be to append a collection of brushes along the \(y\)-axis, each colored in by its range’s value. Dragging the endpoint for one brush would also change the endpoint of its neighbor, so that the \(y\)-axis is always partitioned into regions.
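The requirement that the \(y\)-axis always stays partitioned can be sketched independently of any drawing code. In the plain JavaScript below, neighboring ranges share their break values, so moving one break automatically updates both adjacent ranges. The function names moveBreak and rangesFromBreaks, and the axis limits yMin and yMax, are hypothetical. In a real implementation, the new value would come from a brush or drag event.

```javascript
// Hypothetical state: sorted break values that partition the y-axis into color ranges.
// moveBreak returns a new array with break i moved to newValue, clamped so that
// it never crosses its neighbors (or the axis limits yMin / yMax).
function moveBreak(breaks, i, newValue, yMin, yMax) {
  const lower = i === 0 ? yMin : breaks[i - 1];
  const upper = i === breaks.length - 1 ? yMax : breaks[i + 1];
  const clamped = Math.min(Math.max(newValue, lower), upper);
  return breaks.map((b, j) => (j === i ? clamped : b));
}

// Each pair of consecutive boundaries defines one colored range.
function rangesFromBreaks(breaks, yMin, yMax) {
  const bounds = [yMin, ...breaks, yMax];
  return bounds.slice(0, -1).map((lo, k) => [lo, bounds[k + 1]]);
}

const breaks = [100, 200, 300];
const moved = moveBreak(breaks, 1, 350, 0, 400); // dragged past its upper neighbor
console.log(moved);                              // [100, 300, 300]
console.log(rangesFromBreaks(moved, 0, 400));
```

Storing only the shared breaks, rather than a separate start and end for each range, is one simple way to guarantee that the ranges never overlap or leave gaps.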

CalFresh Enrollment II

Grading

  • a and b, Code (0.5 points each): Clear and concise (0.5 points), correct but unnecessarily complex (0.25 points), missing (0 points)

  • a and b, Design (0.5 points each): Creative and readable design (0.5 points), generally appropriate but lacking critical attention (0.5 points), difficult to read (0 points).

  • c, Discussion (1 point): Creative and well-developed proposal (1 point), appropriate but somewhat underdeveloped (0.5 points), missing or inappropriate interpretations (0 points).

  • d, Design and Code (1 point): Creative design and readable code (1 point), generally appropriate but lacking some attention to detail (0.5 points), missing or only partially complete (0 points).

Question

In this problem, we will develop an interactively linked spatial and temporal visualization of enrollment data from CalFresh, a nutritional assistance program in California. We will use D3 in this problem.

Example Solution

  1. Using a line generator, create a static line plot that visualizes change in enrollment for every county in California.

The main function used to draw the line plot is shown below. We use line_gen to map dates to CalFresh enrollment (square root transformed to help legibility of smaller counts, though transformation was not necessary to receive full credit). This function assumes that ts has already been reshaped into an array of arrays (one internal array for each county). The d3.select("#ts") block appends a line for each county, and the steps below define the \(x\) and \(y\) axes based on scales that were previously defined. The full implementation can be read here.

function line_plot(ts, scales) {
  let line_gen = d3.line()
    .x(d => scales.x(d.date))
    .y(d => scales.y(Math.sqrt(d.calfresh)))

  d3.select("#ts")
    .selectAll("path")
    .data(ts).enter()
    .append("path")
    .attr("d", line_gen)

  let axis = {
    x: d3.axisBottom(scales.x),
    y: d3.axisLeft(scales.y).ticks(4)
  }

  d3.select("#x_axis").call(axis.x)
  d3.select("#y_axis").call(axis.y)
}
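Since line_plot assumes that ts is already an array of arrays, here is a minimal sketch of how that reshaping might be done in plain JavaScript. It assumes each input row is an object with county, date, and calfresh fields (the field names used in the code above). The function name nest_by_county and the toy values are hypothetical, and the full implementation may do this differently.

```javascript
// Group flat rows into one array per county, preserving row order within each county.
function nest_by_county(rows) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row.county)) groups.set(row.county, []);
    groups.get(row.county).push(row);
  }
  return Array.from(groups.values());
}

// Toy rows (made-up values, for illustration only)
const rows = [
  { county: "A", date: new Date("2015-01-01"), calfresh: 100 },
  { county: "B", date: new Date("2015-01-01"), calfresh: 400 },
  { county: "A", date: new Date("2015-02-01"), calfresh: 121 },
];
const ts = nest_by_county(rows);
console.log(ts.length);    // 2 (one inner array per county)
console.log(ts[0].length); // 2 (two dates for county "A")
```

With D3 loaded, Array.from(d3.group(rows, d => d.county).values()) should produce the same structure.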

The final figure looks like this:

  1. On the same page as part (a), create a choropleth map of California, shaded in by the average enrollment for that county over the full time window of the dataset.

The function below draws the required choropleth. geo and ts are objects containing the geojson and time series information, respectively. The first part of the function defines the projection, geographic path generator, and a summary of county-level enrollments. The block below this appends the county borders to create the required map with counties filled in by their average enrollment.

function choropleth(geo, ts) {
  let proj = d3.geoMercator()
    .fitSize([width / 2, height], geo)
  let path = d3.geoPath()
    .projection(proj)
  let means = county_means(ts)

  d3.select("#map")
    .selectAll("path")
    .data(geo.features).enter()
    .append("path")
    .attrs({
      d: path,
      fill: d => scales.fill(means[d.properties.county])
    })
}

To generate the means object with county-level enrollments, we used the function below. It loops over every series in the array and takes an average, keeping the county name in the object’s keys for quick reference later.

function county_means(ts) {
  let means = {}
  for (let i = 0; i < ts.length; i++) {
    means[ts[i][0].county] = d3.mean(ts[i].map(d => Math.sqrt(d.calfresh)))
  }

  return means;
}
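As a quick check of what county_means returns, the sketch below reproduces its logic on a toy input in plain JavaScript, with a small mean helper standing in for d3.mean so that it runs without D3. The toy values and the names mean, county_means_sketch, and toy_ts are made up for illustration.

```javascript
// Plain-JavaScript stand-in for d3.mean, so the sketch runs without D3.
function mean(values) {
  return values.reduce((a, b) => a + b, 0) / values.length;
}

// Same logic as county_means above: average the square-root enrollments per county.
function county_means_sketch(ts) {
  const means = {};
  for (const series of ts) {
    means[series[0].county] = mean(series.map(d => Math.sqrt(d.calfresh)));
  }
  return means;
}

// Toy input: made-up numbers chosen so that the square roots are integers.
const toy_ts = [
  [{ county: "A", calfresh: 100 }, { county: "A", calfresh: 144 }],
  [{ county: "B", calfresh: 400 }, { county: "B", calfresh: 900 }],
];
console.log(county_means_sketch(toy_ts)); // { A: 11, B: 25 }
```

Note that the result is the mean of the square-root enrollments, not the square root of the mean. This matches the transformation used for the line plot.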
  1. Propose one interactive, graphical query that links the combined spatial + temporal view from (a - b). For example, you may consider highlighting time series when a county is hovered, highlighting counties when specific series are brushed, or updating choropleth colors when a subsetted time range is selected.

There are many possible solutions to this part of the problem. We will show a variation of one presented in the problem statement: hovering a county in the choropleth will show just the one time series associated with that county. Moving from one county to another, the shape of the line will smoothly deform into the new county’s shape. The minimum and maximum \(y\)-axis values will also be adjusted, so that the currently focused county’s time series takes up all available space.

The advantage of this approach is that it allows easy comparison of the differences in trend / shape across counties. Even if two counties have very different absolute enrollment numbers, their trends may be similar. This can be hard to notice without zooming into just the county of interest. Considering that shapes tend to be similar for neighboring counties, the proposed transitions tend to be easy to follow (and those which are not smooth stand out as interesting discrepancies). The downside of this approach is that, in addition to de-emphasizing the difference in absolute enrollments, it relies on memory to make the comparison between counties. This can be especially hard if the counties are not immediately adjacent to one another.

  1. Implement your proposal from part (c).

The final implementation is shown below, and the full code can be read here. We will walk through the essential elements below, but first, let’s study some of the takeaways. Los Angeles has by far the largest enrollment. Monterey and, to a lesser extent, Santa Cruz and Santa Barbara counties show strong cyclic patterns in enrollment. Counties in Central California (e.g., San Joaquin and Merced) show a steady decline followed by a dramatic peak. Marin and Sonoma counties seem to be missing data for one timepoint; oddly enough, they are neighboring counties.

The main structure of the implementation is contained in the following function, which is called every time a choropleth path is moused over.

function update_display(ev, d, ts) {
  update_label(d)
  update_map(d)
  update_ts(ev, d, ts)
}

update_label and update_map are relatively straightforward, since they simply change properties of elements that were already drawn. update_ts is somewhat more complex, because it must adapt the \(y\)-scale to the currently drawn time series. Its implementation is given below. The first block filters down to the current series based on the moused-over map path. It also modifies the \(y\)-scale so that the current series’ minimum and maximum take up all the space available. We can define a new line generator based on this scale. Using the filtered data and updated line generator, we can redraw the series, with a smooth transition in between – this is the content of the block starting with d3.select("#ts"). The final block updates the \(y\)-axis to reflect the new scale, making sure the transition exactly matches the change in line heights.

function update_ts(ev, d, ts) {
  cur_ts = ts.filter(e => e[0].county == d.properties.county)
  scales = make_scales(cur_ts)

  let line_gen = d3.line()
    .x(d => scales.x(d.date))
    .y(d => scales.y(Math.sqrt(d.calfresh)))

  d3.select("#ts")
    .selectAll("path")
    .data(cur_ts)
    .transition().duration(transition_length)
    .attrs({
      d: line_gen,
      "stroke-opacity": 1
    })

  d3.select("#y_axis")
    .transition().duration(transition_length)
    .call(d3.axisLeft(scales.y).ticks(4))
}
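The make_scales function is not shown above, so the following is only a guess at the \(y\)-scale portion of what it might do, written in plain JavaScript so that it runs without D3. The helpers linear_scale (a stand-in for d3.scaleLinear) and make_y_scale, the height argument, and the toy values are all hypothetical.

```javascript
// Hypothetical stand-in for d3.scaleLinear: maps [d0, d1] onto [r0, r1].
function linear_scale([d0, d1], [r0, r1]) {
  return v => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

// Hypothetical sketch: fit the y-scale to the series currently in focus.
function make_y_scale(cur_ts, height) {
  const values = cur_ts.flat().map(d => Math.sqrt(d.calfresh));
  const domain = [Math.min(...values), Math.max(...values)];
  // SVG y coordinates grow downward, so the domain minimum maps to the bottom.
  return linear_scale(domain, [height, 0]);
}

// Toy series (made-up values) for a single county
const cur = [[{ calfresh: 100 }, { calfresh: 225 }, { calfresh: 400 }]];
const y = make_y_scale(cur, 300);
console.log(y(Math.sqrt(100))); // 300 (bottom of the plot)
console.log(y(Math.sqrt(400))); // 0   (top of the plot)
console.log(y(Math.sqrt(225))); // 150
```

Recomputing the domain from only the focused county is what lets its series fill the available vertical space. A series with constant enrollment would give a zero-width domain and would need special handling.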

Temporal and Geospatial Commands

Grading

  • a - d, Discussion for role (2/3 points each): Complete and accurate description of function (2/3 points), generally correct but either vague or copied from notes (1/3 points), incorrect or incomplete (0 points)
  • a - d, Discussion for example (2/3 points each): Creative and clearly explained scenario (2/3 points), correct but vague or copied from notes (1/3 points), inaccurate or underdeveloped example (0 points).

Question

For each of the commands below, explain what role it serves in temporal or spatial data visualization. Describe a specific (if hypothetical) situation within which you could imagine using the function.

Example Solution

  1. geom_stream
  • Role: This can be used to add a stream graph layer in a visualization of evolving group totals over time.
  • Example: A stream graph would be appropriate if we wanted to visualize the total revenue across movie genres over time. Each genre corresponds to a category, and its stream describes how the revenue for that genre evolves over time. Since these totals should only change gradually, the smoothing automatically imposed by the stream graph would lead to a more aesthetically pleasing visualization than a plain stacked area plot.
  1. tm_lines
  • Role: This can be used to add a layer of line vector data to a tmap object.
  • Example: Vector line data are useful whenever we have data that are measured along a trajectory (and which do not loop back to form a polygon), and tm_lines helps to add these to a base tm_shape() object. For example, we might have data on the path followed by particles through an accelerator. Though this is not traditionally georeferenced data, we can still study the trajectories by encoding them as vector line data and drawing them with tmap.
  1. rast
  • Role: This can be used to read in a .tiff file as a spatial raster object for downstream calculation or visualization.

  • Example: Raster data are often used to store meteorological, elevation, spectral or satellite imagery information that have been measured along an even grid. For example, we might have estimated soil carbon for each 1km x 1km block across the United States, and this function would allow us to read it in to answer questions like, what is the strength of spatial correlation in soil carbon?

  1. geom_horizon
  • Role: This can be used to add a horizon plot layer to a multiple time series visualization.

  • Example: Horizon plots are useful whenever we have a large collection of time series. For example, suppose we have conducted a topic analysis of the neuroscience literature. We might be interested in seeing which topics (e.g., anatomy, disease, plasticity, …) have become more or less common over time. Since we may need a large collection of topics before the model fits well, a horizon plot would be useful.

Visualization for Everyday Decisions

Grading

  • a - b, Discussion (0.75 points each): Well-developed and accurate descriptions (0.75 points), correct but somewhat underdeveloped (0.5 points), missing discussion (0 points).
  • c, Design (0.75 points): Creative and readable design (0.5 points), generally appropriate but lacking critical attention (0.5 points), difficult to read (0 points).
  • c, Discussion (0.75 points): Well-developed and accurate descriptions (0.75 points), correct but somewhat underdeveloped (0.5 points), missing discussion (0 points).

Question

We routinely encounter spatial and temporal data in everyday life, from the dates on a concert calendar to the layout of buttons in an elevator. This problem asks you to critically reflect on the way these data are represented.

Example Solution

  1. Describe one type of spatial or temporal dataset (loosely defined) that you encounter in a nonacademic context. What visual encodings are used? Does it implement any interactivity?

There are many possible correct answers for this problem. Here is one example: if you want to purchase a ticket to an SF Symphony concert, you are shown an interactive seating chart. Each polygon matches one part of Davies Symphony Hall.

Hovering over different seating areas highlights the price ranges in the right hand panel (conversely, hovering over the price ranges shows the associated seating area), an example of dynamic linking. Moreover, when you click on a seating area, you can see the individual seats – this zooming in and out based on clicking is an example of the overview + detail principle.

Each circle encodes one seat. If a circle is filled green, then that seat has already been taken. Blue circles encode wheelchair accessible / wheelchair companion seats. If a circle is unfilled, then hovering over it will reveal a tooltip with the seat number and price. This view can also be panned and zoomed. In addition to the seats, the zoomed-in views show the row and box names, box orientations, and stair locations.

The small scented widget at the top left hand corner lets you click to switch between seating areas. Hovering over a seating area in this widget for a few seconds reveals a tooltip with the area’s name. This mini-visualization is much easier than having a dropdown menu with terms like “Loge” and “1st Tier,” which might not be obvious.

  1. What questions does the visualization help you answer? How easily can you arrive at an accurate answer?

The different views within this visualization help answer a range of questions:

  • The overall linked view helps answer how proximity to the stage is related to minimum and maximum seat price ranges. It is easy to answer these questions precisely, since the exact prices are printed in a table (not encoded using any mark). There are also not so many seating areas that it is hard to navigate across all of them.
  • The seating-area view lets us see the distribution of seating availabilities. It also allows us to query the prices of individual seats within the overall area. The view seems better suited for answering questions about availability than prices – to look up prices, we have to hover over individual seats, but availability can be determined at a glance.
  • Panning and zooming in the seating-area view makes it easier to make local calculations about individual seats. For example, how many seats are in that row? How many of the seats in front of it are already taken? Zooming in can help with this.
  1. In the spirit of the re-imagined parking sign project, propose an alternative representation of these data (or an alternative way of interacting with the same representation). Why did you propose the design that you did? What advantages do you think it has?

This is already quite a sophisticated view, but we can imagine a few variations that prioritize different queries:

  • If we want to support more efficient price lookups, we could color seats in according to their prices, in addition to their availabilities. For example we could use a white-green color gradient to show the range in prices across seats within a seating area and black to say that the seat is taken.
  • Similarly, we could include a brushable histogram of all seat prices in the whole concert hall. Brushing over the histogram could update the fill color in the overview / scented widget polygons to highlight those seating areas with the most available seating within the specified price ranges.
  • Perhaps we are interested in the view associated with each seat. If such image data were available, we could imagine revealing it whenever the user hovers over a seat. We could even imagine having an audio sample from a microphone at that seat, so we can ask how the instrumental balance varies across the hall.
  • What if we were looking for \(K\) seats that are all together? We could imagine a slider that filters to only those seats that would allow a party of \(K\) people to sit adjacent to one another.
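The party-of-\(K\) filter in the last bullet can be sketched as a simple scan over each row. The plain JavaScript below assumes a hypothetical representation in which a row is an array of booleans (true meaning the seat is available). The function name seats_for_party and the toy row are made up for illustration. A slider would supply the value of k.

```javascript
// Hypothetical representation: a row is an array of booleans, true = seat available.
// Returns the indices of every seat that belongs to a run of at least k
// consecutive available seats.
function seats_for_party(row, k) {
  const keep = new Set();
  let runStart = 0;
  for (let i = 0; i <= row.length; i++) {
    // A run ends at a taken seat or at the end of the row.
    if (i === row.length || !row[i]) {
      if (i - runStart >= k) {
        for (let j = runStart; j < i; j++) keep.add(j);
      }
      runStart = i + 1;
    }
  }
  return [...keep];
}

const row = [true, true, false, true, true, true, false, true];
console.log(seats_for_party(row, 3)); // [3, 4, 5]
console.log(seats_for_party(row, 2)); // [0, 1, 3, 4, 5]
```

Seats outside the returned set could then be dimmed, so that only the blocks large enough for the party remain visually prominent.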